405 research outputs found

    Benchmarking High Performance Architectures With Natural Language Processing Algorithms

    Natural Language Processing algorithms are resource demanding, especially when tuning to an inflective language like Polish is needed. The paper presents the time and memory requirements of part-of-speech tagging and clustering algorithms applied to two corpora of the Polish language. The algorithms are benchmarked on three high-performance platforms of different architectures. Additionally, sequential versions and OpenMP implementations of the clustering algorithms are compared.

    Mobile Social Networks For Live Meetings

    In this article, we present the idea of combining social networking websites with the capabilities of modern mobile devices to take social networking activity to a higher level. Nowadays, these devices and websites offer remote communication (phone calls, message exchange, etc.), which can potentially be used to notify people about meetings in the real world. Since current social network models do not provide enough information for such notification (social networking websites are examples of social networks), a new social network model suitable for this application is proposed, and a new social platform based on mobile devices is introduced. This platform can notify users when their friends are nearby. The paper presents the model and the simulation that verifies the approach.
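
    The "friends are nearby" notification described in this abstract can be illustrated with a minimal sketch. The abstract does not say how proximity is computed, so everything below is an assumption made for illustration: the haversine great-circle distance, the 500 m default radius, and the hypothetical names `haversine_m` and `nearby_friends`. It is not the model proposed in the paper.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearby_friends(user, positions, friends, radius_m=500.0):
    """Return the sorted list of the user's friends located within radius_m.

    positions: dict mapping user name -> (lat, lon); hypothetical data layout.
    friends:   dict mapping user name -> set of friend names.
    """
    if user not in positions:
        return []
    lat, lon = positions[user]
    result = []
    for friend in sorted(friends.get(user, ())):
        if friend in positions:
            flat, flon = positions[friend]
            if haversine_m(lat, lon, flat, flon) <= radius_m:
                result.append(friend)
    return result


# Hypothetical example data: two users a short walk apart, one in another city.
positions = {
    "alice": (50.0614, 19.9366),
    "bob": (50.0620, 19.9370),
    "carol": (52.2297, 21.0122),
}
friends = {"alice": {"bob", "carol"}}
print(nearby_friends("alice", positions, friends))
```

    A real platform would also need location updates, privacy controls, and the richer social network model the paper argues for; the sketch covers only the distance check.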

    Comparison of Latent Semantic Analysis and Probabilistic Latent Semantic Analysis for Documents Clustering

    In this paper we compare the usefulness of statistical dimensionality reduction techniques for improving the clustering of documents in Polish. We start with partitional and agglomerative algorithms applied to the Vector Space Model. Then we investigate two transformations: Latent Semantic Analysis and Probabilistic Latent Semantic Analysis. The results show an advantage of the Latent Semantic Analysis technique over the probabilistic model. We also analyse the time and memory consumption of these transformations and present runtime details for an IBM BladeCenter HS21 machine.

    Comparison of Information Representation Formalisms for Scalable File Agnostic Information Infrastructures

    In the early days of computing, files were just a natural way of storing information -- one that reflected the way one would file punch cards in a cabinet drawer. Unfortunately, the requirement to fragment information into such chunks is a huge bottleneck for the evolution of the global information space that the Internet has become. The concept of a file causes several problems, including unnatural clustering of information, unnecessary replication of data, and very expensive information discovery in distributed computing environments. The overall goal of this work is to design an architecture enabling a new era in computing and networking -- a computing infrastructure without the concept of a file. Files are seen by many specialists as one of the main bottlenecks in the evolution of modern IT systems. This is mostly due to a very unnatural fragmentation of information into chunks, which are easier for operating systems to manage but much more difficult for information processing tools, and eventually for humans themselves, to work with.

    A Case Study of Algorithms for Morphosyntactic Tagging of Polish Language

    The paper presents an evaluation of several part-of-speech taggers, representing the main tagging algorithms, applied to the corpus of the frequency dictionary of the contemporary Polish language. We report our results for two tagging schemes: the IPI PAN positional tagset and its simplified version. Tagging accuracy is calculated for different training sets and takes into account many subcategories (accuracy on known and unknown tokens, word segments, sentences, etc.). The results are compared with those for other inflecting and analytic languages. Performance aspects (time demands) of the tagging tools used are also discussed.

    Resource Storage Management Model for Ensuring Quality of Service in the Cloud Archive Systems

    Nowadays, service providers offer many IT services in the public or private cloud. The client can buy various kinds of services, such as SaaS, PaaS, etc. Backup as a Service (BaaS) was recently introduced as a variety of SaaS. Several different BaaS offerings are currently available for archiving data in the cloud, but they provide only a basic level of service quality. In the paper we propose a model which ensures QoS for BaaS, together with some methods for managing storage resources aimed at achieving the required SLA. The model introduces a set of parameters responsible for the SLA level, which can be offered at the basic or a higher level of quality. The storage systems (typically HSM), which are distributed between several Data Centres, are built on disk arrays, VTLs, and tape libraries. The RSMM model does not assume bandwidth reservation or control, but rather focuses on the management of storage resources.

    Increasing Quality of the Corpus of Frequency Dictionary of Contemporary Polish for Morphosyntactic Tagging of the Polish Language

    The paper is devoted to the correction of the erroneous and ambiguous corpus of the Frequency Dictionary of Contemporary Polish (FDCP) and its application to morphosyntactic tagging of the Polish language. Several stages of corpus transformation are presented, and baseline part-of-speech tagging algorithms are also evaluated.

    Modelling Agents Cooperation Through Internal Visions of Social Network and Episodic Memory

    Human societies appear in many types of simulations. In particular, many new computer games contain a virtual world that imitates the real world. Among the most important and most difficult elements of a society to model are the social context and cooperation between individuals. In this paper we show how the social context and the ability to cooperate can be provided using agents that are equipped with internal visions of mutual social relations. An internal vision is a representation of social relations from the agent's point of view, so, being subjective, it may be inconsistent with reality. We introduce the agent model and a mechanism for rebuilding the agent's internal vision that is similar to the one used by humans. An experimental proof-of-concept implementation is also presented.

    Application of Weighted Voting Taggers to Languages Described with Large Tagsets

    The paper presents baseline and complex part-of-speech taggers applied to the modified corpus of the Frequency Dictionary of Contemporary Polish, annotated with a large tagset. First, the paper examines the accuracy of 6 baseline part-of-speech taggers. The main part of the work presents simple weighted voting and complex voting taggers. Special attention is paid to lexical voting methods and to the issues of ties and fallbacks. The TagPair and WPDV voting methods achieve the top accuracy among all the methods considered. The error reduction of 10.8% with respect to the best baseline tagger for the large tagset is comparable with other authors' results for small tagsets.
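
    Simple weighted voting with a tie fallback, as mentioned in this abstract, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the function name `weighted_vote`, the tagger names, the weights (imagined as held-out accuracies), and the rule that ties fall back to the highest-weighted tagger are all hypothetical. It does not implement TagPair or WPDV.

```python
from collections import defaultdict


def weighted_vote(proposals, weights):
    """Pick one tag for a single token from several taggers' proposals.

    proposals: dict mapping tagger name -> proposed tag.
    weights:   dict mapping tagger name -> voting weight (assumed here to be
               something like the tagger's accuracy on held-out data).
    """
    # Sum the weights of all taggers that proposed each tag.
    scores = defaultdict(float)
    for tagger, tag in proposals.items():
        scores[tag] += weights[tagger]

    best_score = max(scores.values())
    winners = [tag for tag, score in scores.items() if score == best_score]
    if len(winners) == 1:
        return winners[0]

    # Tie (exact score equality): fall back to the tag proposed by the
    # highest-weighted tagger among those that voted for a tied tag.
    tied_taggers = [t for t in proposals if proposals[t] in winners]
    best_tagger = max(tied_taggers, key=lambda t: weights[t])
    return proposals[best_tagger]


# Hypothetical taggers and weights, for illustration only.
weights = {"tagger_a": 0.9, "tagger_b": 0.8, "tagger_c": 0.7}
proposals = {"tagger_a": "subst", "tagger_b": "adj", "tagger_c": "adj"}
print(weighted_vote(proposals, weights))
```

    In the example the two weaker taggers outvote the strongest one because their combined weight is higher; with weights such as 0.5, 0.25, and 0.25 the scores tie and the fallback returns the strongest tagger's proposal.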

    Database Replication for Disconnected Operations with Quasi Real-Time Synchronization

    Database replication is a way to improve system throughput or achieve high availability. In most cases, using an active-active replica architecture is efficient and easy to deploy. Such a system has CP properties (from the CAP theorem: Consistency, Availability and network Partition tolerance). Creating an AP (available and partition-tolerant) system requires multi-primary replication. Because of the many difficulties in implementing it, this approach is not widely used. However, the deployment of CCDB (the experiment conditions and calibration database) needs to be an AP system in two locations. This necessity became the inspiration to examine the state of the art in this field and to test the available solutions. The tests evaluate the performance of the chosen replication tools: Bucardo and EDB Replication Server. They show that the tested tools can be successfully used for continuous synchronization of two independent database instances.